In this session we will talk about probability.
Remember, there are two areas of applied biostatistics: descriptive statistics and statistical inference.
Last session we spent a fair amount of time talking about descriptive statistics. We talked about numerical summaries and graphical summaries of data. We talked about different numerical summaries and different graphical summaries for continuous, categorical, ordinal, and dichotomous variables. Once you have your sample described, you then want to make generalizations about the population based on what you are seeing in the sample.

Probability is going to come into play when we make those generalizations. In practice we have one sample to analyze, but in theory there are many possible samples that could arise from a population. Again, we'll use capital "N" to represent the population size and lower case "n" to represent the sample size.

This picture shows that in a large population there are many possible samples that could arise. Before we get to probability, let's think about how that actual sample came about. There are different ways to produce samples. The first and simplest is a simple random sample, and the way you produce this is to first enumerate or list all members of the population, and that list is called the sampling frame. You then select the number of individuals for your sample at random, such that each person has the same chance of being selected.

For example, if you have a population of size 100 and you want to select one person, then every person on that list has the same chance (1 out of 100) of being selected.
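As a rough sketch of what a simple random sample looks like in practice, the short Python example below draws from a made-up sampling frame. The ID numbers 1 to 100, the seed, and the sample size of 10 are assumptions for illustration only.

```python
import random

# Hypothetical sampling frame: ID numbers 1..100 stand in for the listed population.
sampling_frame = list(range(1, 101))

# Fixed seed (an arbitrary choice) so the sketch gives the same result each run.
rng = random.Random(2024)

# Select one person; every ID has the same 1-in-100 chance of being chosen.
one_person = rng.choice(sampling_frame)

# Select a simple random sample of 10 people without replacement.
sample = rng.sample(sampling_frame, 10)

print(one_person, sample)
```

Because the selection is without replacement, no one can appear in the sample twice.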

Another kind of sample is called a systematic sample. You start at the same place, that is, with the sampling frame or the listing of all members of the population. You then determine a sampling interval. 
Say in this case the population size is 1,000, and you want to select 100 people. The sampling interval would be 1,000/100 or 10, and what you want to do is select every tenth person in the list, but the first person is selected at random from among the first ten. So you select someone at random from the first ten, maybe it’s the fifth person, and then you take every tenth person from that point on. 
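The systematic sample just described can be sketched in a few lines of Python. The sampling frame of ID numbers and the seed are assumptions made for illustration; the population size, sample size, and interval are the ones from the example.

```python
import random

N = 1000            # population size
n = 100             # desired sample size
interval = N // n   # sampling interval: 1,000 / 100 = 10

# Hypothetical sampling frame: ID numbers 1..1000 in list order.
sampling_frame = list(range(1, N + 1))

# Fixed seed (an arbitrary choice) so the sketch is repeatable.
rng = random.Random(7)

# Pick the starting person at random from among the first ten (1..10 inclusive).
start = rng.randint(1, interval)

# Take that person and every tenth person after them.
systematic_sample = sampling_frame[start - 1::interval]

print(start, len(systematic_sample))
```

Whatever the random start turns out to be, this produces exactly 100 people spaced 10 apart in the list.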

Another sampling strategy is one that uses what is called "stratified sampling". Here you organize the population into mutually exclusive strata, or groups, and you select people at random from within each. So you might use stratified sampling if, say for example, you wanted to make sure that you put together a sample that had appropriate representation of people from different racial or ethnic backgrounds. 
You would stratify people according to their background and then choose people at random from within those groups.

A convenience sample is different from the others in that it is a non-probability sample. People aren't selected with a pre-specified probability in mind. A convenience sample, as the name suggests, is one that you can convene by some convenient means. So, for example, if you've ever been stopped in a mall by a person conducting a survey, you were part of a convenience sample.

A quota sample is one in which you select a predetermined number of people into the sample from specific groups. For example, you might have in mind that you want to produce a sample of size 100, and you know that half of the population is male and half is female. So, in your sampling you might select individuals to participate until you have 50 men and 50 women.

Now we will talk about some basics of probability, and I use the term basics loosely because this is the starting point, but it can get quite complicated.

So, first, a probability reflects the chance or likelihood that some outcome will occur. Probabilities are numbers between 0 and 1, inclusive. A probability of 0 means an outcome is never going to happen. A probability of 1 means it is certain to happen. 

Calculating a probability involves counting, and we will use that little formula to determine probabilities. A probability is the number of individuals who have a particular outcome of interest divided by the population size. So, if we are selecting a single individual from a population, the probability that we select Jane Smith, for example, from a population of 100 people would just be one over 100, assuming there is only one Jane Smith. 

Often we're not concerned with selecting a particular individual into a sample, but rather with the probability of selecting certain kinds of people. So we might be interested in the probability of selecting a female, for example. I'm using the word "event" here to represent a characteristic of the individuals. So the probability of selecting a female would be the number of outcomes, or the number of individuals who are female, over the population size. We will use this second formula over and over in our examples.

So let's take a very small population of 6 people. We will identify them with a unique identification number. What we know about them is their gender and their age in years.

Now, if I select a person at random, the probability of selecting person #1 is the same as the probability of selecting person #2, et cetera to person #6, and they are all equal to 1 out of 6.

Again, we're not usually concerned about selecting individuals into our sample, but instead we may want to select certain kinds of individuals. So, I will use capital letters A, B, and C to represent different characteristics of these individuals: A where a female is selected, B where a male is selected, and C where a subject is over the age of 65. On the right side I'm listing the identification numbers of the individuals who meet those criteria.

So, persons 2, 4, and 6 are female, and 1, 3, and 5 are male. And 5 and 6 are over the age of 65. So, using that probability formula, number of people with the outcome divided by the population size, the probability of selecting a female is 3 out of 6. The probability of selecting a male is 3 out of 6. And the probability of selecting someone over age 65 is 2 out of 6. It is fine to leave probabilities as fractions, but if you want to convert them into decimals, that's fine too.
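The counting formula can be checked with a small Python sketch. The exact ages are not given in this transcript, so the example records only whether each person is over 65; the helper name `probability` is just an illustrative choice.

```python
from fractions import Fraction

# The six-person population from the example. Exact ages are not given in the
# text, so only an "over 65" flag is recorded for each person.
population = {
    1: {"sex": "M", "over_65": False},
    2: {"sex": "F", "over_65": False},
    3: {"sex": "M", "over_65": False},
    4: {"sex": "F", "over_65": False},
    5: {"sex": "M", "over_65": True},
    6: {"sex": "F", "over_65": True},
}

def probability(event):
    """Number of people who satisfy `event` divided by the population size."""
    count = sum(1 for person in population.values() if event(person))
    return Fraction(count, len(population))

p_A = probability(lambda p: p["sex"] == "F")  # A: a female is selected
p_B = probability(lambda p: p["sex"] == "M")  # B: a male is selected
p_C = probability(lambda p: p["over_65"])     # C: someone over age 65 is selected

print(p_A, p_B, p_C)  # prints 1/2 1/2 1/3 (Fraction reduces 3/6 and 2/6)
```

Python's Fraction type reduces the fractions automatically, so 3 out of 6 shows up as 1/2 and 2 out of 6 as 1/3.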

The complement of an event includes all outcomes that are not in the event. Event A represented selecting a female, so in this population its complement would be selecting a male. The individuals who make up the complement are 1, 3, and 5. Those are the identification numbers of the males. The probability of the complement, that is, the probability that I don't select a female, is 3 out of 6. I'm counting how many individuals are part of the complement and dividing by 6.

There is a rule to figure out this probability, and the complementary rule says that the probability of a complement is just 1 minus the probability of the event. Well, the probability of A was 3 out of 6, and 1 minus that gives me 3 out of 6 again. Now, when you calculate probabilities, you can use rules, like this one, or you can use that simple formula. Just count up how many individuals have the event or characteristic of interest divided by the total population size. So, as an example, the probability of C complement, that is the probability I select a person 65 or younger would be 4 out of 6.

Two events are said to be mutually exclusive if they have no outcomes in common. So, A represented selecting a female. That was persons 2, 4, and 6. B represented selecting a male, persons 1, 3, and 5. A and B are mutually exclusive because they have no outcomes in common. Event C was selecting a person over the age of 65. A and C are not mutually exclusive, because person 6 is in both. B and C are not mutually exclusive, because person 5 is in both.
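One way to see complements and mutually exclusive events is to treat each event as a set of identification numbers. The sketch below does that in Python; the function names are hypothetical helpers, not standard terminology.

```python
from fractions import Fraction

everyone = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # female
B = {1, 3, 5}  # male
C = {5, 6}     # over age 65

def prob(event):
    """Count the people in the event and divide by the population size."""
    return Fraction(len(event), len(everyone))

def complement(event):
    """Everyone who is not in the event."""
    return everyone - event

def mutually_exclusive(e1, e2):
    """True when the two events have no outcomes in common."""
    return not (e1 & e2)

# Complement rule: P(complement) = 1 - P(event).
print(prob(complement(A)), 1 - prob(A))
print(prob(complement(C)))

print(mutually_exclusive(A, B))  # A and B share no one
print(A & C, B & C)              # the people who keep A, C and B, C from being exclusive
```

Counting the complement directly and applying the 1 minus rule give the same answer, which is the point of the rule.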

Now let's consider a slightly more realistic example. Here we have a population of 208 patients who are seeking care at an allergy clinic. What we know about them is their gender in the rows of the table, and their highest level of education in years. So someone is in the 0-8 column if they completed at most up to 8 years. Someone is in the 9-12 column if they did any high school or they completed high school, and so on.

So, now we're going to select a person at random from this population and calculate some probabilities. First, the probability that I select any particular person from the population is 1 out of 208, because 208 is the total number possible.

Here are some practice problems. The probability that I select a male from this population would be 78 out of 208, because there are exactly 78 males out of 208 possible selections. I can leave that as a fraction or I can divide and say the probability of selecting a male is 0.375. Now, there are two interpretations of that number. The first is that it is the probability of selecting any male from this population. Another way to think of it is that 37.5% of the population are male.
 
Let's take the second one - the probability that I select a person who has between 9 and 12 years of education. Well, there are exactly 62 out of 208 people with 9 to 12 years of education.

Next is the probability of selecting a female who has 9 to 12 years of education. So think of the rule: how many people satisfy the condition over the number possible. There are exactly 42 females who have 9 to 12 years of education, divided by 208.

The next one - 17 or more years of education and male. There are exactly 26 meeting those criteria out of 208. 

And last, the probability of selecting someone who has at most 12 years of education. Here you must think of what "at most" means. That would be anyone who has 0 to 8 or 9 to 12 years. So, 45 plus 62 people, or 107, satisfy the numerator out of the 208 possible. Again, for each of these I'm determining how many people meet the specified criteria out of 208. Each of the probabilities that we just computed is an unconditional probability. In each case the denominator was 208, the total number of people in the population.
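The practice problems above can be reproduced with a short Python sketch. The full table is not reproduced in this transcript, so the example uses only the counts that are stated in the text; the dictionary keys are made-up labels.

```python
TOTAL = 208  # everyone in the allergy clinic population

# Only the counts quoted in the text are used; the labels are illustrative.
counts = {
    "male": 78,
    "educ_0_8": 45,
    "educ_9_12": 62,
    "female_and_educ_9_12": 42,
    "male_and_educ_17_plus": 26,
}

def unconditional(count):
    """People meeting the criteria divided by the whole population."""
    return count / TOTAL

p_male = unconditional(counts["male"])
p_educ_9_12 = unconditional(counts["educ_9_12"])
p_female_9_12 = unconditional(counts["female_and_educ_9_12"])
p_male_17_plus = unconditional(counts["male_and_educ_17_plus"])

# "At most 12 years" combines the 0-8 and 9-12 columns.
p_at_most_12 = unconditional(counts["educ_0_8"] + counts["educ_9_12"])

for name, p in [("male", p_male), ("9-12 years", p_educ_9_12),
                ("female and 9-12", p_female_9_12),
                ("male and 17+", p_male_17_plus),
                ("at most 12", p_at_most_12)]:
    print(f"{name}: {p:.3f}")
```

Every one of these uses 208 as the denominator, which is what makes them unconditional.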

I now want to talk about something called conditional probability. A conditional probability is a probability that looks at a particular subgroup, not the whole population.
You compute probabilities of certain events conditioning on or restricting your attention to a certain criterion.
So, for example, we could ask, what is the probability of selecting a male, given that I am focused just on people who have 17 or more years of education. That vertical line is read as "given". So, what I'm asking here is, what is the chance I select a male, given that I am focused on the rightmost column of the table, that is, people with 17 or more years of education. 

Sometimes it is easier to plug in the denominator first. There are exactly 53 people with 17 or more years of education. From that subgroup 26 of them are male. 
So, this probability is 26 out of 53, which can also be expressed as 0.491. So approximately half of the people with 17 or more years of education are male. The first problem we did on the last slide asked for the probability of selecting a male, not looking at any particular education level but over the whole population, and we computed that to be 78 out of 208.

To compute conditional probabilities, you can do it the way we have done all of the probabilities which is to determine how many people satisfy the condition of interest divided by the number possible. Now with all the unconditional probabilities that denominator was 208. In the last example where we asked the probability of selecting a male, given we were restricting our attention to people with 17 or more years of education, the denominator became 53. It's perfectly fine to use that same approach. Divide the number of people who meet the criteria by the total number possible to compute the conditional probability. Or, there is a conditional probability rule, and that is laid out here under the definition.

The probability of A given B is equal to the probability of (A and B) divided by the probability of B. In our example we were looking for the probability of selecting a male, given that the person had 17 or more years of education. The definition would say that is equal to the probability of selecting a person who's both male and has 17 or more years of education divided by the probability that someone has 17 or more years of education. Now each of those two probabilities is an unconditional probability, and the denominator is 208 for each. So, the probability of selecting someone who is both a male and has 17 or more years of education is 26 out of 208, and the probability of selecting someone with 17 or more years of education is 53 out of 208. When you divide, the 208s cancel and you get 26 out of 53. In other words, you get exactly the same answer using the conditional probability formula that you would get using the simple rule asking how many people meet the criteria and dividing by the total number possible.
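To see that the two approaches agree, here is a minimal Python check using exact fractions. The variable names are illustrative; the counts are the ones from the example.

```python
from fractions import Fraction

TOTAL = 208
male_and_17_plus = 26   # male AND 17 or more years of education
educ_17_plus = 53       # everyone with 17 or more years of education

# Approach 1: restrict attention to the subgroup and count.
direct = Fraction(male_and_17_plus, educ_17_plus)

# Approach 2: the conditional probability rule, P(A | B) = P(A and B) / P(B).
p_A_and_B = Fraction(male_and_17_plus, TOTAL)
p_B = Fraction(educ_17_plus, TOTAL)
by_rule = p_A_and_B / p_B

print(direct, by_rule, round(float(by_rule), 3))  # prints 26/53 26/53 0.491
```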

One last concept is the concept of "independent events." Two events are said to be independent if the probability of one is not influenced by the occurrence or non-occurrence of the other event. Look at this data table for a population with 210 people, similar to what we were just analyzing, but the numbers have been rounded a bit. Again, it shows their gender and highest level of education in years. If we use this table and compute the probability of selecting a male, it would be 105 out of 210, or 0.5. Suppose we again ask: What is the probability of selecting a male, given that the person has 17 or more years of education? Well, looking just at those with 17 or more years, there are 50 of them and exactly 25 are male. So that is also 0.5. For this example the person's gender and educational level are independent. Knowing a person's educational level doesn't tell you anything about the likelihood of selecting a male.

Now, there is a rule for determining whether two events are independent. And we will just use A and B. A and B are said to be independent if the probability of A is equal to the probability of A, given B, that is, if an unconditional probability is equal to a conditional probability. And it works in that direction and in the reverse. If the probability of B is equal to the probability of B, given A, then A and B are said to be independent.
A third condition is that the probability of A and B will be equal to the product of the probability of A and the probability of B.

To check to see if two events are independent, you can check any one of these three conditions. If one of them holds, they will all hold. If one does not hold, then none of them will.

So, for the real data, that is the data with 208 people, we computed the probability of selecting a male to be 0.375. We then calculated the probability of being a male, given that the person had 17 or more years of education, and we determined that to be 0.491. So, the probability of male is not equal to the probability of male, given 17+ years of education. Thus, sex and education are NOT independent. There is a relationship. In the full population just over a third of people are male, but when you focus on people with the highest educational level, almost half of them are male. So, there is a relationship between years of education and male sex.
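As a sketch, the three independence checks can be run on both tables in Python. The function name and argument order are assumptions made for illustration; exact fractions are used so that "equal" means exactly equal rather than equal after rounding.

```python
from fractions import Fraction

def independence_checks(n_total, n_a, n_b, n_a_and_b):
    """Return the three checks: P(A)=P(A|B), P(B)=P(B|A), P(A and B)=P(A)P(B)."""
    p_a = Fraction(n_a, n_total)
    p_b = Fraction(n_b, n_total)
    p_a_and_b = Fraction(n_a_and_b, n_total)
    p_a_given_b = Fraction(n_a_and_b, n_b)
    p_b_given_a = Fraction(n_a_and_b, n_a)
    return (p_a == p_a_given_b,
            p_b == p_b_given_a,
            p_a_and_b == p_a * p_b)

# A = male, B = 17 or more years of education.
rounded_table = independence_checks(210, 105, 50, 25)  # the 210-person table
real_table = independence_checks(208, 78, 53, 26)      # the 208-person table

print(rounded_table)  # prints (True, True, True)
print(real_table)     # prints (False, False, False)
```

The three checks agree with one another in each table, as the rule says they should.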

Here's another example. Look at this table, and we see whether people had cardiovascular disease (CVD) and whether they had a family history of CVD shown in the rows of the table. Are family history of CVD and an individual's current status regarding CVD independent? To figure this out, we need to do some calculations. First, looking at the table, the probability that someone currently has CVD is 40 out of 345. So, just about 12% have cardiovascular disease. Now focus on people with a family history in the second row of the table. What is the likelihood that a person has CVD, given a family history? Looking at the second row of the table, there are 15 people with CVD out of 105 with a family history. That probability is just over 14%.

What about the probability of CVD given no family history? Well, now we focus on the first row of the table. There are 240 without a family history (345 in total minus the 105 with a family history), and among them 25 have CVD. So about 10% of that group has CVD.

So, what is the relationship between family history of CVD and current CVD? Well, the unconditional and conditional probabilities are not equal. People with a family history of CVD are more likely to have CVD, and there is a relationship there. Those with a family history have an increased probability of having CVD.
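The CVD comparison can be laid out as a short Python sketch. The table itself is not reproduced here, so the number of people without a family history is derived in the code as 345 minus 105 rather than read off the table; the variable names are illustrative.

```python
total = 345        # everyone in the table
cvd_total = 40     # people who currently have CVD
fh_total = 105     # people with a family history of CVD
cvd_with_fh = 15   # people with CVD among those with a family history

# Derived counts (not read from the table): the remaining row of the table.
no_fh_total = total - fh_total            # people without a family history
cvd_without_fh = cvd_total - cvd_with_fh  # people with CVD and no family history

p_cvd = cvd_total / total                       # unconditional
p_cvd_given_fh = cvd_with_fh / fh_total         # conditional on family history
p_cvd_given_no_fh = cvd_without_fh / no_fh_total

print(f"P(CVD)               = {p_cvd:.3f}")
print(f"P(CVD | family hx)   = {p_cvd_given_fh:.3f}")
print(f"P(CVD | no family hx)= {p_cvd_given_no_fh:.3f}")
```

Since the conditional probabilities differ from the unconditional one, the two characteristics are not independent in these data.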

Here is an example showing 120 men participating in a study evaluating a screening test for prostate cancer. The screening test measures prostate-specific antigen (PSA) in the blood. The levels of PSA are shown in the rows. Men have low, moderate, or high levels of PSA. In addition to having the screening test, the men agreed to have a biopsy, which indicates whether they actually have prostate cancer or not. So, the table shows both the screening test results and whether they had prostate cancer.

What is the probability of prostate cancer given a low PSA? The low PSAs are in the first row of the table, and 3 out of 64 have prostate cancer. So, just about 5% of the men with low PSAs have prostate cancer. Looking at those with moderate levels of PSA, 13 out of 41 or about 32% have prostate cancer. And in the group with high PSAs 12 out of 15 or 80% have prostate cancer. Is this screening test a useful test for screening for prostate cancer?
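To help think about that question, here is a small Python sketch that computes the three conditional probabilities side by side. The dictionary layout is just one illustrative way to hold the counts quoted above.

```python
# For each PSA level: (men with prostate cancer, men at that PSA level).
psa = {
    "low": (3, 64),
    "moderate": (13, 41),
    "high": (12, 15),
}

# P(prostate cancer | PSA level): restrict to one row and count.
p_cancer_given_psa = {
    level: cases / n_level for level, (cases, n_level) in psa.items()
}

for level, p in p_cancer_given_psa.items():
    print(f"P(cancer | {level} PSA) = {p:.2f}")
```

The probability of prostate cancer rises steadily from the low to the high PSA group.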

This concept of conditional probability is used a lot in evaluating screening tests. Screening tests are not diagnostic tests. In medicine and public health screening tests are used to identify people who might be at risk for particular conditions. If they screen positive, they will go on to have further testing to either confirm the disease or rule it out.

Screening tests are evaluated using conditional probability, and the data are often laid out as presented in this table, that is, the screening test is classified as positive or negative. Now, the PSA test actually had 3 levels. But many screening tests are classified as positive or negative. The columns of the table indicate whether the disease was present or absent.

Here, I'm just using letters: a, b, c, d, but we will write out different summary measures based on this type of data.
People often want to know the "sensitivity" of a screening test, which is also called the true positive fraction. The sensitivity of a test is the probability that a person will test positive, given that they have the disease. Sensitivity is a probability, so it is between 0 and 1. You would want people who have the disease to have a high probability of screening positive.

On the other hand, specificity represents the true negative fraction. It is the probability that someone tests negative, given that they do not have disease. So, you would like a test with high sensitivity and high specificity. The false positive fraction is the probability that someone who does not have disease tests positive. So, this represents a mistake.

The false negative fraction is the other error that can be made, and it is the probability that someone who actually has the disease will test negative. Which error is worse depends on the disease that you're looking at. It might not be horrible if someone without the disease tests positive when they go on to further testing to rule out the disease. However, if you are dealing with a serious disease like HIV, you would not want to have a high false positive fraction, because that would cause lots of stress in a person who tests positive, even if you tell them this is only a screening test.

False negative fractions also have implications, and you have to think about the disease you are dealing with and the implications of these errors.

These 4 measures: the sensitivity, specificity, false positive fraction, and false negative fraction are referred to as the performance characteristics of a test. What you as a patient want to know is the positive predictive value, that is, "What is the probability that I have the disease, given that I had a positive test?" I get my results, and the doctor says that I tested positive. I want to know if I'm really likely to have the disease or not so likely.
You also want to know the negative predictive value. What is the probability that you don't have the disease, given that you tested negative?

If you get a negative test result, should you be completely relieved if there is a high probability that you don't have the disease? These are things that you want to know as a patient.

Here is an example of a screening test looking at whether a pregnant woman's fetus has Down syndrome. This screening test is either positive or negative based on levels of hormones in the blood, and if positive, women also undergo an amniocentesis to determine definitively whether the fetus is affected. Here, there were 4,810 women who had both the screening test and the amniocentesis. From these we can estimate the performance characteristics of the test. 
For these data the sensitivity is the probability that a woman tests positive, given they have an affected fetus. 
That would be 9 out of 10, or 0.9.

The specificity is the probability that a woman tests negative, given that they have an unaffected fetus.
That is 4,449/4,800 which equals 0.927.

The false negative fraction is the probability that someone tests negative, given that they have an affected fetus. So, this is an error. One in ten women with an affected fetus will have a negative test result. Notice that the sensitivity and false negative fraction are complements of one another. If you know the sensitivity, you will know the false negative fraction, because one is just 1 minus the other. The false positive fraction is the probability that a woman tests positive given that she has an unaffected fetus. That would be 351 out of 4800 or about 7%.

Specificity and the false positive fraction are complements of one another. If you know one, you can compute the other.
You really don't need to report all four of these, because if you know two, you can figure out the other two.  Often, when people summarize a screening test, they report the sensitivity and the false positive fraction, what I have listed here as the first and last. Given those, you can compute the others. So, is this a good test?

Well, 90% of women whose fetuses have the disease will have a positive test. That's good. Ideally, we would like it to be 100%, but it is a screening test. The false positive fraction is about 7%, meaning that in 7% of cases with no problem the screening test is positive. Well, a lot of women who get these tests in pregnancy might be told their baby may have Down syndrome. Maybe 7% is too high a false positive fraction. There is no right answer in terms of how high the sensitivity must be and how low the false positive fraction must be.
It depends on the nature of the condition you are dealing with.
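The performance characteristics can be collected into one small Python function. This is a sketch under an assumed cell layout, since the transcript does not spell out which letter goes in which cell: a = test positive with disease, b = test positive without disease, c = test negative with disease, d = test negative without disease. The predictive values are included because they were defined earlier, although the lecture does not compute them for these data.

```python
def performance(a, b, c, d):
    """Screening-test measures from a 2x2 table.

    Assumed layout: a = test+ & disease, b = test+ & no disease,
                    c = test- & disease, d = test- & no disease.
    """
    return {
        "sensitivity": a / (a + c),               # P(test+ | disease)
        "specificity": d / (b + d),               # P(test- | no disease)
        "false_positive_fraction": b / (b + d),   # P(test+ | no disease)
        "false_negative_fraction": c / (a + c),   # P(test- | disease)
        "positive_predictive_value": a / (a + b), # P(disease | test+)
        "negative_predictive_value": d / (c + d), # P(no disease | test-)
    }

# Down syndrome screening example: 10 affected (9 test positive),
# 4,800 unaffected (351 test positive, 4,449 test negative).
down = performance(a=9, b=351, c=1, d=4449)

for name, value in down.items():
    print(f"{name}: {value:.3f}")
```

Note that the two predictive values condition on the test result (the rows), while the four performance characteristics condition on the true disease status (the columns).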

One more application of probability is something called Bayes theorem, and it is an extension of the idea of conditional probability. The idea here is that you revise or update probability estimates based on additional information. When using Bayes theorem, we talk about a "prior probability" which is an initial estimate. It is your initial estimate of the probability that someone has a particular condition. 
Then, you gather more information and use that additional information to revise or update your prior probability and produce what is called a "posterior probability." This comes up a lot in medicine, where given a person's presentation of symptoms, we might have an initial estimate of their probability of disease. After doing some testing, blood tests or other procedures, we might use that to update the probability that they have a particular condition.

And here is what Bayes theorem looks like. There is a simpler version at the top and a more involved one below, but they are equivalent. The top one says that the probability of A given B equals the probability of B given A times the probability of A, divided by the probability of B. What that means is that we want to know a conditional probability, that is, A given B. What we have available is B given A, in some ways the inverse. So we take that information and, with some mathematical manipulation, we can produce the conditional probability we are interested in.
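The slide itself is not reproduced in this transcript, so as a sketch, the two versions are usually written as follows. Writing the complement of A as A^c is a notational assumption; the second form simply expands the denominator P(B) over A and its complement.

```latex
% Simpler version
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}

% More involved version: P(B) is expanded over A and its complement A^{c}
P(A \mid B) = \frac{P(B \mid A)\, P(A)}
                   {P(B \mid A)\, P(A) + P(B \mid A^{c})\, P(A^{c})}
```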

Let me show an example, and then we can cycle back to the formula. In Boston, 51% of adults are male. What we are doing is selecting people at random to participate in a study. So, if we select one person at random from all adults living in Boston, what is the probability of selecting a male? Well, it would just be 0.51, because that's the composition of the population. Suppose we select our participant, but we don't yet know their gender. However, what we do know is that the person smokes. We also know that about 9.5% of males in Boston smoke as compared to about 1.7% of females. So, now, knowing that the person we selected smokes, and having some information about the relationship between smoking and gender, does that tell us more about the likelihood that we have selected a male?

So, what we want to find out now is the probability that the person is a male, given that the person is a smoker. So, with a little bit of notation here, we will use capital M to represent selecting a male. M complement would be female - not male. The probability that we select a smoker, given they are male, is 0.095. This is known to us. And the probability we select a smoker, given they are not male (i.e., a female), is 0.017. All of that information comes from the previous slide.
 
The question is what is the probability we selected a male given that the person smokes? Well, Bayes theorem says that's the reverse of that conditional probability. In the numerator we have the probability they smoke, given they are male, which we know from the first bullet, times the probability of selecting a male, which we also know. In the denominator we have the probability they are male times the probability they smoke, given they are male, plus the probability they are not male times the probability they smoke, given they are not male. We have all of that information. Substituting the 4 figures at the top into the equation, we get 0.853. So, knowing that this person smokes dramatically increases the probability that it is a male. Before we knew the smoking status, the probability that it was a male was 0.51. Knowing that the person smokes, and that there is a relationship between smoking and gender, tells us there is a much higher probability that we selected a male.
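The substitution can be followed step by step in a short Python sketch. The variable names are illustrative; the four inputs are the figures given in the example.

```python
p_male = 0.51                  # prior: P(M)
p_female = 1 - p_male          # P(M complement) = 0.49
p_smoke_given_male = 0.095     # P(S | M)
p_smoke_given_female = 0.017   # P(S | M complement)

# Denominator: overall probability of selecting a smoker.
p_smoke = (p_male * p_smoke_given_male
           + p_female * p_smoke_given_female)

# Bayes theorem: P(M | S) = P(S | M) * P(M) / P(S).
p_male_given_smoke = p_smoke_given_male * p_male / p_smoke

print(round(p_male_given_smoke, 3))  # prints 0.853
```

The prior of 0.51 is updated to a posterior of about 0.853 once we learn that the person smokes.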

Let's go back to our example of screening tests and what we want to know as patients. Suppose we know that the likelihood that someone has a particular disease is 0.002. In other words, about 2 per 1000 people have the disease.

Let's say there is a screening test for this disease, and it has an 85% sensitivity. This is something that is reported in the literature. So, it's a good test with 85% sensitivity, meaning that people who have the disease have about an 85% chance of testing positive. We also know that about 8% of people who have this test will test positive, and the remainder test negative. I have the test, and it comes back positive. What's the probability I have the disease, given that I have the information about the test and I tested positive?

Before we get to that, what is the probability that I have the disease without having the test? Knowing nothing about the test, my chance of having the disease would be the same as anyone else's so it would be 0.002. So now I've had this screening test, and I want to know, what is my chance of having the disease given that I have a positive test? Well, what we know is that the probability of disease is 0.002, our sensitivity, and the probability that people test positive and negative. 

Bayes theorem at the bottom says the probability of disease given test positive is the probability of test positive given disease times the probability of disease over the probability that I test positive. All of these are known, so I plug them in, and I come up with 0.021. So, I've taken the test, it came back positive, so now my chance of having the disease is 2%. Well, initially, not having had the test it was 0.2%.
Having a positive test now increases the likelihood I have the disease ten-fold to an absolute risk or probability of 2%. Do I have the disease?

This is an interesting issue because lots of times people look at the sensitivity and say "Well, the sensitivity is 85%. Why isn't that boosting the probability much higher?" Well, it actually did boost it 10 times. But the likelihood of having the disease is so small to start with, that even having a positive test doesn't really increase my level of risk to the point that I need to worry.
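Here is the same calculation as a minimal Python sketch, using the three figures given in the example. The variable names are illustrative.

```python
p_disease = 0.002                # prior: about 2 per 1,000 people have the disease
p_positive_given_disease = 0.85  # sensitivity of the screening test
p_positive = 0.08                # about 8% of people tested screen positive

# Bayes theorem: P(disease | test+) = P(test+ | disease) * P(disease) / P(test+).
p_disease_given_positive = p_positive_given_disease * p_disease / p_positive

# How many times larger the posterior is than the prior.
fold_increase = p_disease_given_positive / p_disease

print(round(p_disease_given_positive, 3))  # prints 0.021
print(round(fold_increase, 1))
```

The posterior is roughly ten times the prior, yet it is still only about 2%, because the disease was so rare to begin with.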